Improving Hungarian Text Categorization Using Domain-Specific Ontology

Authors

  • István Pilászy
  • András Förhécz
Abstract

The aim of text categorization is to automatically assign documents to a set of predefined categories. The prevailing approach uses a collection of precategorized examples to induce a document classifier through learning methods. In this paper we introduce a method that combines state-of-the-art learning techniques with background knowledge. We use a KAON ontology for knowledge representation and have developed a reasoning method that makes use of the relations in the ontology. Our experiments show that the method substantially enhances the results of text categorization, indicating that a domain-specific ontology can improve performance. The proposed method is applicable to spam filtering, document reorganization, and the classification of news stories and e-mails.
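The abstract does not spell out the reasoning method, so the following is only a minimal sketch of one common way ontology relations can feed a learned classifier: each document token is expanded with the superconcepts reachable from it, and these inferred concepts are added as extra features. The toy ontology, the `concept:` feature prefix, and the function names `ancestors` and `enrich` are all assumptions made for illustration; none of them comes from the paper or from the KAON API.

```python
from collections import Counter

# Hypothetical toy ontology: each term maps to its direct superconcepts.
# These entries are invented for illustration and are not from the paper.
TOY_ONTOLOGY = {
    "forint": ["currency"],
    "euro": ["currency"],
    "currency": ["finance"],
    "goalkeeper": ["football"],
    "football": ["sport"],
}


def ancestors(term, ontology):
    """Return every concept reachable from `term` via superconcept links."""
    seen = set()
    stack = list(ontology.get(term, []))
    while stack:
        concept = stack.pop()
        if concept not in seen:
            seen.add(concept)
            stack.extend(ontology.get(concept, []))
    return seen


def enrich(tokens, ontology):
    """Count the original tokens plus one 'concept:' feature per inferred ancestor."""
    features = Counter(tokens)
    for token in tokens:
        for concept in ancestors(token, ontology):
            features["concept:" + concept] += 1
    return features


# Two currency terms both contribute to the shared "currency" and "finance" concepts.
features = enrich(["forint", "euro", "rises"], TOY_ONTOLOGY)
```

The enriched counts could then be passed to any bag-of-words learner. Whether the authors add concepts as features, reweight terms, or combine classifier and reasoner outputs in some other way is not stated in the abstract.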

Similar resources

Text categorization using topic model and ontology networks

Text categorization based on pre-defined document categories is one of the most crucial tasks in text mining applications in recent decades. Successful text categorization highly relies on the text representations generated from documents. In this paper, an innovative text categorization model, VSM_WN_TM, is presented. VSM_WN_TM is a special Vector Space Model (VSM) that incorporates word frequ...

Text categorization using automatically acquired domain ontology

In this paper, we describe ontology-based text categorization in which the domain ontologies are automatically acquired through morphological rules and statistical methods. The ontology-based approach is a promising way for general information retrieval applications such as knowledge management or knowledge discovery. As a way to evaluate the quality of domain ontologies, we test our method thr...

Public Transport Ontology for Passenger Information Retrieval

Passenger information aims at improving the user-friendliness of public transport systems while influencing passenger route choices to satisfy transit user’s travel requirements. The integration of transit information from multiple agencies is a major challenge in implementation of multi-modal passenger information systems. The problem of information sharing is further compounded by the multi-l...

Feature Generation for Text Categorization Using World Knowledge

We enhance machine learning algorithms for text categorization with generated features based on domain-specific and common-sense knowledge. This knowledge is represented using publicly available ontologies that contain hundreds of thousands of concepts, such as the Open Directory; these ontologies are further enriched by several orders of magnitude through controlled Web crawling. Prior to text...

Domain Specific Named Entity Recognition (DSNER) from Web Documents

Named entity recognition is a tool used in natural language processing tasks such as text categorization, speech translation, and document classification. Web data promotes the idea that more and more data can be interconnected. A step towards this goal is to bring more structured annotations to existing documents using common vocabularies or ontologies. Semi-structured texts such as scient...

Publication date: 2011